Assembly Language
Copyright Brian Brown, 1988-2000. All rights reserved.
This module is the individual work of Brian Brown. It may not be copied or used in any form without his permission.
OBJECTIVE
The study of advanced microprocessor architectures will
aid students in understanding complex systems and enable efficient
software production.
The Intel 80x86 Processor Family
The main members of the Intel 80x86 family are:
Processor   Address Bus Size   Data Bus Size   Initial Clock Rate
8088        20 bits (1 MB)     8 bits          4.77 MHz
8086        20 bits (1 MB)     16 bits         4.77 MHz
80286       24 bits (16 MB)    16 bits         8-12 MHz
80386       32 bits (4 GB)     32 bits         16-33 MHz
80486       32 bits (4 GB)     32 bits         33-66 MHz
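Each extra address line doubles the reachable memory, so an n-bit address bus selects 2^n byte locations. As a small C sketch (the function name `addressable_bytes` is hypothetical, chosen only for illustration), the memory figures in the table can be reproduced:

```c
#include <stdint.h>

/* Number of distinct byte addresses an n-line address bus can select: 2^n.
   Valid for 0 <= address_lines <= 63. */
uint64_t addressable_bytes(unsigned address_lines)
{
    return (uint64_t)1 << address_lines;
}
```

For example, `addressable_bytes(20)` is 1048576 (1 MB), `addressable_bytes(24)` is 16777216 (16 MB) and `addressable_bytes(32)` is 4294967296 (4 GB).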
iAPX386
The programming model is extended beyond that of the iAPX88 to
include the following features.
The on-chip instruction queue is 16 bytes long. As the average instruction is 3.2 bytes long, up to 5 instructions will be pre-fetched. The iAPX386 has a 3 stage internal execution pipeline for decoding and executing instructions.
Long integers are now supported (32 and 64 bits). Also added was
Virtual memory support is enhanced by the use of instruction continuation (restart). An instruction that cannot complete, for example because the page it references is not in memory, can be restarted later once the operating system has dealt with the cause.
The processor runs at one of several privilege levels, and the IOPL field in the flags register sets the level required to perform I/O. Extra protection is provided by descriptor tables on a task-by-task basis, which describe each segment's privilege level and access rights (execute, read only, read/write). Each task can also have an I/O permission bit map associated with it, which determines the ports it can access. In addition, the paging unit assigns privileges and access rights to each page.
The use of an internal memory management unit imposes no delay in performing the address translation, thus the system runs with a four stage bus cycle.
With 32-bit data and address buses, the iAPX386 addresses 4 gigabytes of physical memory. At a clock rate of 30 MHz, the clock cycle time is 33.3 ns. The processor supports address pipelining: the address for the next bus cycle is placed on the address bus one clock cycle before the end of the current cycle. This gives the memory more time to respond to each address, allowing zero wait state operation where one wait state might otherwise have been necessary.
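The cycle time quoted above is simply the reciprocal of the clock frequency (T = 1/f, and 1/(1 MHz) = 1000 ns). A one-line C sketch, using a hypothetical helper named `clock_period_ns`, makes the arithmetic explicit:

```c
/* Clock period in nanoseconds for a clock frequency given in MHz:
   T = 1 / f, and one period at 1 MHz lasts 1000 ns. */
double clock_period_ns(double clock_mhz)
{
    return 1000.0 / clock_mhz;
}
```

`clock_period_ns(30.0)` gives about 33.3 ns; at the 386's 33 MHz maximum the period drops to about 30.3 ns.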
The 80386 programming model
The iAPX386 processor is housed in a
132-pin grid array (PGA) package and manufactured using the CHMOS III process. It
operates at speeds up to 33 MHz, with instruction rates up to 8 MIPS. The
programming model implements paging and multitasking.
CPU Registers
General Purpose Registers
AX      general data, results of operations, multiply/divide
BX      general data, base register for memory addressing
CX      general data, count register for loop and rotate instructions
DX      general data, multiply/divide instructions
Segment Registers
CS      code
DS      data
ES      extra data
FS/GS   additional data segments (new with the 386)
SS      stack
String Registers
SI      source index
DI      destination index
Pointer Registers
SP      stack pointer
BP      general purpose base pointer (defaults to the stack segment)
IP      instruction pointer
CARRY FLAG (CF) - Set by arithmetic instructions which generate either a carry or a borrow.
PARITY FLAG (PF) - Set by most instructions if the least significant byte of the result contains an even number of 1's.
AUXILIARY FLAG (AF) - Set if an arithmetic operation generates a carry or a borrow out of bit 3 of the result. It is used in BCD arithmetic.
ZERO FLAG (ZF) - Set by most instructions if the result is binary zero.
SIGN FLAG (SF) - Most operations set this bit the same as the most significant bit of the result.
TRACE FLAG (TF) - Permits single stepping of programs. After executing a single instruction, the processor generates an internal exception 1.
INTERRUPT FLAG (IF) - When set, the processor recognises external interrupts on the INTR pin.
DIRECTION FLAG (DF) - Set and cleared using the STD and CLD instructions. It is used in string processing: when clear, string instructions increment SI and DI; when set, they decrement them.
OVERFLOW FLAG (OF) - Most arithmetic instructions set this bit to indicate that the signed result was too large to fit in the destination.
INPUT/OUTPUT PRIVILEGE LEVEL FLAGS (IOPL) - Used in protected mode to provide four levels of I/O security.
NESTED TASK FLAG (NT) - Used in protected mode. When set, it indicates that one system task has invoked another via a CALL instruction, rather than a JMP.
RESUME FLAG (RF) - Used with the debug registers DR6 and DR7. When set, it temporarily disables debug faults, so that an instruction can be restarted after a breakpoint without triggering it again.
VIRTUAL 8086 MODE FLAG (VM) - Permits the 80386 to behave like a high-speed 8086.
The nested task flag is set to indicate that this task is nested inside another task. This means that the task state segment has a back link to the previous task.
The two-bit IOPL field sets the privilege level a task needs in order to execute I/O-sensitive instructions such as IN, OUT, CLI and STI. The processor uses this to enforce security and protection amongst processes and peripheral devices.
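The flag bits can be pictured as fields of the 32-bit EFLAGS register. The C sketch below is illustrative only: the constant and function names are hypothetical, though the bit positions follow Intel's 386 EFLAGS layout. It extracts IOPL and computes the flags an 8-bit ADD would produce, using the rules described above for CF, PF, AF, ZF, SF and OF:

```c
#include <stdint.h>

/* Bit positions of the 80386 EFLAGS fields described above. */
enum {
    FLAG_CF = 1u << 0,   /* carry */
    FLAG_PF = 1u << 2,   /* parity */
    FLAG_AF = 1u << 4,   /* auxiliary carry */
    FLAG_ZF = 1u << 6,   /* zero */
    FLAG_SF = 1u << 7,   /* sign */
    FLAG_TF = 1u << 8,   /* trace */
    FLAG_IF = 1u << 9,   /* interrupt enable */
    FLAG_DF = 1u << 10,  /* direction */
    FLAG_OF = 1u << 11,  /* overflow */
    FLAG_NT = 1u << 14,  /* nested task */
    FLAG_RF = 1u << 16,  /* resume */
    FLAG_VM = 1u << 17   /* virtual 8086 mode */
};

/* IOPL occupies bits 12-13, giving privilege levels 0-3. */
unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags >> 12) & 3u;
}

/* PF rule: set when the low byte of a result holds an even number of 1 bits. */
int parity_flag(uint32_t result)
{
    unsigned ones = 0;
    for (unsigned bit = 0; bit < 8; bit++)
        ones += (result >> bit) & 1u;
    return (ones % 2) == 0;
}

/* Flags produced by an 8-bit ADD of a and b (CF, PF, AF, ZF, SF, OF only). */
uint32_t add8_flags(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;          /* up to 9 bits wide */
    uint8_t res = (uint8_t)sum;
    uint32_t f = 0;

    if (sum > 0xFF)                      f |= FLAG_CF;  /* carry out of bit 7 */
    if (parity_flag(res))                f |= FLAG_PF;
    if ((a ^ b ^ res) & 0x10)            f |= FLAG_AF;  /* carry out of bit 3 */
    if (res == 0)                        f |= FLAG_ZF;
    if (res & 0x80)                      f |= FLAG_SF;  /* copy of the MSB */
    if ((a ^ res) & (b ^ res) & 0x80)    f |= FLAG_OF;  /* signed overflow */
    return f;
}
```

For instance, adding 0FFh and 01h yields a zero byte with a carry, so CF, PF, AF and ZF are set; adding 7Fh and 01h gives 80h, setting SF, AF and OF because two positive numbers produced a negative result.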
PROTECTED MODE REGISTERS
The descriptor registers are used to
establish privileges for tasks and to point to system tables which hold
information about stacks, tasks, exception vectors etc.
GLOBAL DESCRIPTOR TABLE REGISTER
Points to a general purpose table
of segment descriptors, and can be used by all programs to reference segments of
memory. It is mandatory for protected mode operation.
INTERRUPT DESCRIPTOR TABLE REGISTER
Points to a table of gate
descriptors that define interrupt or exception handling routines. This replaces
the interrupt vector table of the iAPX86.
LOCAL DESCRIPTOR TABLE REGISTER
Holds the address of a per-task
table of segment descriptors. Each task can be assigned its own LDT, or share
another task's LDT to support multitasking features (shared data, code).
TASK REGISTER
Identifies the currently executing task.
CONTROL, DEBUG, TEST REGISTERS
The six debug registers provide
on-chip support for debugging. Up to four linear breakpoint addresses can be specified;
register DR7 is used to control the breakpoints, whilst DR6 holds their
current status.
The test registers control the operation of the on-chip translation lookaside buffer. In paged mode, the iAPX386 caches the base address of the most recently used pages in an internal buffer. This speeds up memory accesses, as the processor does not need to access the page tables in memory to find out where pages are located in system memory (except for those not cached).
When the processor accesses a page whose base address is not in the TLB, the base address of the page is obtained from the page tables in system memory, and at the same time the TLB entries are updated. Whenever the CR3 register is loaded, the TLB is flushed so that it reflects the current page table values.
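A TLB lookup can be modelled in a few lines of C. This is a simplified sketch under stated assumptions: 4 Kbyte pages as on the iAPX386, but a single flat page table (the real processor walks a two-level page directory and page table structure) and a small fully associative TLB with round-robin replacement (the real 386 TLB is a 32-entry, four-way set associative cache). All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12   /* 4 Kbyte pages, as used by the 386 */
#define TLB_ENTRIES 8    /* illustrative size, not the real 386 TLB */

typedef struct {
    bool     valid;
    uint32_t page;        /* linear page number (address >> 12) */
    uint32_t frame_base;  /* physical base address of the page */
} TlbEntry;

typedef struct {
    TlbEntry entry[TLB_ENTRIES];
    unsigned next_victim;        /* round-robin replacement pointer */
    unsigned misses;             /* page table lookups performed */
    const uint32_t *page_table;  /* hypothetical flat table: page -> frame base */
} Tlb;

/* Loading CR3 selects new page tables, so every cached entry is discarded. */
void tlb_load_cr3(Tlb *t, const uint32_t *page_table)
{
    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        t->entry[i].valid = false;
    t->next_victim = 0;
    t->page_table = page_table;
}

/* Translate a linear address, consulting the page table only on a TLB miss. */
uint32_t tlb_translate(Tlb *t, uint32_t linear)
{
    uint32_t page   = linear >> PAGE_SHIFT;
    uint32_t offset = linear & ((1u << PAGE_SHIFT) - 1u);

    for (unsigned i = 0; i < TLB_ENTRIES; i++)
        if (t->entry[i].valid && t->entry[i].page == page)
            return t->entry[i].frame_base | offset;  /* hit: no table access */

    /* Miss: read the page table in memory and update the TLB at the same time. */
    t->misses++;
    uint32_t base = t->page_table[page];
    TlbEntry *e = &t->entry[t->next_victim];
    t->next_victim = (t->next_victim + 1) % TLB_ENTRIES;
    e->valid = true;
    e->page = page;
    e->frame_base = base;
    return base | offset;
}
```

Loading new tables with `tlb_load_cr3` discards every cached entry, so the next access to each page misses and costs a page table lookup again.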
EXCEPTIONS
Hardware Interrupts
Software Exceptions
Vector   Exception                     Type         Error Code
0        Divide error (÷0)             Fault
1        Debug                         Fault/Trap
2        NMI                           Trap
3        Breakpoint                    Trap
4        Overflow                      Trap
5        Bound                         Fault
6        Invalid opcode                Fault
7        Device not available          Fault
8        Double fault                  Abort        Yes
9        Coprocessor segment overrun   Abort
10       Invalid TSS                   Fault        Yes
11       Segment not present           Fault        Yes
12       Stack                         Fault        Yes
13       General protection            Fault        Yes
14       Page                          Fault        Yes
15       Reserved
16       Coprocessor error             Fault
17-31    Reserved
32-255   Maskable interrupts and 'INT n' vectors
MEMORY MANAGEMENT
Memory management is a general term which includes all the various
techniques by which an address generated by the CPU is translated into the
actual address of the data in memory.
The address generated by the processor is known as a logical address, as it is the address seen by the programmer (or referenced by the program).
The actual memory supplied in a computer system is known as physical memory, and accepts a physical address.
The purpose of address translation is to change the logical address generated by the processor into the physical address for the system memory.
The iAPX386 can access 4Gbytes of logical memory. Using address translation techniques, a program could think it was executing within the top 64k block of this address space, but in reality be executing within the lowest 64k block.
The principle of address translation is important for multi-user systems, which need to relocate user tasks anywhere in available memory. It frees users from depending upon specific memory addresses and requirements.
Address translation is done via a lookup table. The address coming from the processor serves as an index into the table, and the contents of the table entry specify the physical address. If the entire address bus were used, the table would require 2^32 entries. To reduce the size of the table, only a portion of the address bus (a group of high-order address lines) is used as an index into the table. The table entry value is then combined with the rest of the address bus and presented to physical memory. The high-order bits are chosen as the index because they change less often whilst a program is running, and are thus good candidates for caching.
Note how the table entry specifies a base address, whilst the low order bits of the virtual address specify an offset value from that base address.
The limits of a base address plus the range of an offset address value is called a segment or page (base+0 to base+max_offset_value).
The number of low-order address bits left for the offset (those not used to index the table) determines the segment/page size (10 bits = 1024-byte segment).
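The lookup-table scheme described above can be sketched in C, assuming a table indexed by the high-order address bits and a configurable number of low-order offset bits. The function names are hypothetical.

```c
#include <stdint.h>

/* Translate an address through a lookup table indexed by its high-order bits.
   The low offset_bits bits pass straight through as the offset; the table
   entry supplies the base address. Requires offset_bits < 32. */
uint32_t table_translate(const uint32_t *table, unsigned offset_bits, uint32_t logical)
{
    uint32_t index  = logical >> offset_bits;                /* high-order bits */
    uint32_t offset = logical & ((1u << offset_bits) - 1u);  /* low-order bits */
    return table[index] + offset;                            /* base + offset */
}

/* Size of each segment/page: 2^offset_bits bytes. */
uint32_t block_size(unsigned offset_bits)
{
    return 1u << offset_bits;
}

/* Number of table entries needed when address_bits address lines are used. */
uint64_t table_entries(unsigned address_bits, unsigned offset_bits)
{
    return (uint64_t)1 << (address_bits - offset_bits);
}
```

With 10 offset bits each block is 1024 bytes, and a 32-bit address then needs 2^22 (4194304) table entries instead of 2^32.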
In the majority of systems, the logical and physical addresses are different.
In simple 8-bit microprocessor systems, the logical address may have a 1:1 correspondence with the physical address (6802, Z80, 8088).
SEGMENTATION
Segmentation is a technique which involves having all
of a program's code and data resident in RAM at run-time. For a given system, this
limits the number of programs that can be run simultaneously. Segment sizes can
differ from program to program, which means the operating system must spend
considerable time managing the memory system.
The most common problem associated with segmented memory is fragmentation. This occurs when running programs release their segment space, but this space is spread out over the entire address range. Thus, there could be 1 MB of free RAM, but it consists of many small blocks scattered over a 4 MB address range. A program requiring 500K to run could not be loaded, as segmentation requires the memory to be contiguous (one large block). In this instance, the program could not run even though there is sufficient memory.
To overcome this deficiency, the operating system employs a technique called compaction, which involves relocating existing segments so as to combine all the small free blocks into larger blocks, enabling waiting programs to be run.
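Fragmentation and compaction can be illustrated with a toy memory map in C. The `Block` type and the functions are hypothetical, sizes are in Kbytes, and the map is assumed to be a list of contiguous blocks sorted by address that covers the whole range.

```c
#include <stddef.h>

typedef struct {
    unsigned start;  /* Kbytes */
    unsigned size;   /* Kbytes */
    int      used;   /* 1 = allocated segment, 0 = free hole */
} Block;

unsigned total_free(const Block *b, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        if (!b[i].used) sum += b[i].size;
    return sum;
}

/* A new segment must fit in one contiguous hole. */
unsigned largest_free(const Block *b, size_t n)
{
    unsigned best = 0;
    for (size_t i = 0; i < n; i++)
        if (!b[i].used && b[i].size > best) best = b[i].size;
    return best;
}

/* Compaction: slide every allocated segment down to the lowest free address,
   leaving all free space as one block at the top. Returns the new block count. */
size_t compact(Block *b, size_t n)
{
    if (n == 0) return 0;
    unsigned end  = b[n - 1].start + b[n - 1].size;
    unsigned next = b[0].start;
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (b[i].used) {
            b[out] = b[i];
            b[out].start = next;   /* relocate the segment */
            next += b[i].size;
            out++;
        }
    }
    if (next < end) {              /* one contiguous hole at the top */
        b[out].start = next;
        b[out].size  = end - next;
        b[out].used  = 0;
        out++;
    }
    return out;
}
```

With a 4 MB map containing four 256K holes, `total_free` reports 1024K but `largest_free` only 256K, so a 500K program cannot be loaded; after `compact`, the free space forms one 1024K block at the top of memory.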
REAL MODE AND PROTECTED MODE
The 80386 powers up in real mode. This
means it acts and calculates memory addresses just like an 8086 processor. Memory
access is restricted to the lowest 1 MB after an initial jump instruction.
To gain access to all available memory, and enable all the sophisticated features of the 386, it must run in protected mode. This gives access to 4Gb of memory, virtual memory, separation of user tasks, protection between tasks, privileged instructions, newer 32 bit instructions and a wide range of other features.
The 386 has special registers provided to enable protected mode, and these, along with the system tables which define each task to be run, must be set up prior to switching the processor into protected mode.
HOW SEGMENTATION WORKS IN REAL MODE
An 8086 real mode program is
split into a minimum of THREE segments. A segment is a block of memory
referenced by a segment register within the processor.
A segment has a maximum size of 64K. The THREE segments are named DATA, CODE and STACK, and are referenced by the DS, CS and SS registers respectively. Segments may also overlap in memory, and be of different sizes.
The STACK segment handles instructions like CALL, PUSH, POP and interrupt processing.
The CODE segment is used to store instructions (programs).
The DATA segment is used to hold variables and shared data.
Instructions work with one of the defined segments. For example, the instruction
MOV AX, 200
moves the constant value 200 from the code segment into the AX register.
Which segment does the following instruction reference?
MOV AX, [200]
Which segment does the following instruction reference?
PUSH AX
Which segment does the following instruction reference?
MOV AX, ES:[200]
GENERATING THE PHYSICAL ADDRESS
Let's now look at how the actual
address is generated. As explained, all references to memory are relative to a
segment register.
The 8086 uses a 20 bit address bus to generate an address in the range 00000-FFFFF. Yet, all registers in the 8086 are 16 bit registers. The question is,
HOW is a 20 bit address generated from 16 bit registers?
ANSWER: All memory references consist of using TWO REGISTERS!
A segment register defines the base address, and another register is used to specify an offset. For example, the instruction
MOV AX, [02]
means move the value stored in memory location 02 (relative to DS) into the AX register. Let's assume that the register values and memory contents look as follows,
1. LEFT SHIFT THE SEGMENT REGISTER BY FOUR BITS (APPEND ANOTHER HEX 0)
   DS = 0010  becomes  00100
2. ADD THE OFFSET
   00100 + 02 = 00102
Thus the actual memory location referenced is 00102, and AX ends up with the value 34.
CODE INSTRUCTIONS ARE REFERENCED USING CS:IP
STACK IS REFERENCED USING SS:SP or SS:BP
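The two steps above can be written as a C function. This is a sketch (the name `real_mode_address` is hypothetical) that also models the 8086's 20-bit address bus, which makes addresses wrap around at 1 MB:

```c
#include <stdint.h>

/* Real-mode (8086) physical address: segment * 16 + offset.
   The 8086 has only 20 address lines, so the sum wraps at 1 MB. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    uint32_t addr = ((uint32_t)segment << 4) + offset;  /* shift left 4 bits = append a hex 0 */
    return addr & 0xFFFFFu;                             /* keep 20 bits */
}
```

`real_mode_address(0x0010, 0x0002)` returns 0x00102, matching the worked example, while `real_mode_address(0xFFFF, 0x0010)` wraps round to 0x00000 on an 8086.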
PAGING
A logical address can be split into page numbers and offsets
within a page. Paging breaks memory into a number of fixed-size blocks (typically
4K; the iAPX386 uses 4 Kbyte pages). A running program would normally have only 2 to 3 of its pages in
memory, because of locality of reference: programs
normally spend most of their time in a small portion of their address space (for example,
waiting for keyboard entry).
Such a technique will allow more programs to be stored in memory for a given RAM size than segmentation. The remaining pages belonging to a program are stored on disk, and normally loaded when referenced by the running program.
When a running program accesses a memory location outside of its current pages, a page fault occurs. This causes a processor exception, so the operating system then
This is called demand paging, as a new page is brought into memory when a page fault occurs. Each page has a modified bit associated with it, and the operating system uses this bit to determine whether it should be written back to disk when being swapped out.
Each page also has several bits which specify its age, and when determining which page to swap out, the LEAST RECENTLY USED page is chosen. This is called the LRU algorithm.
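Demand paging with LRU replacement and a modified bit can be simulated in C. In this hypothetical sketch, each frame records the time of its most recent use in place of the hardware age bits, a page fault loads the page on demand, and a victim page is written back to disk only if its modified bit is set:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FRAMES 8

typedef struct {
    int      page;       /* -1 = empty frame */
    unsigned last_used;  /* "age": time of the most recent reference */
    bool     modified;   /* page written since it was loaded */
} Frame;

typedef struct {
    unsigned faults;      /* pages brought in on demand */
    unsigned writebacks;  /* modified pages written to disk when evicted */
} PagingStats;

/* Run a reference string through nframes frames using LRU replacement.
   refs[i] is a page number (>= 0); writes[i] is true if that access writes. */
PagingStats simulate_lru(const int *refs, const bool *writes, size_t n, size_t nframes)
{
    Frame f[MAX_FRAMES];
    PagingStats s = {0, 0};
    if (nframes == 0) nframes = 1;
    if (nframes > MAX_FRAMES) nframes = MAX_FRAMES;
    for (size_t i = 0; i < nframes; i++)
        f[i] = (Frame){ -1, 0, false };

    for (size_t t = 0; t < n; t++) {
        size_t slot = nframes;                 /* frame holding refs[t], if any */
        for (size_t i = 0; i < nframes; i++)
            if (f[i].page == refs[t]) slot = i;

        if (slot == nframes) {                 /* page fault: demand-load the page */
            s.faults++;
            slot = 0;                          /* first empty frame, else the LRU one */
            for (size_t i = 0; i < nframes; i++) {
                if (f[i].page == -1) { slot = i; break; }
                if (f[i].last_used < f[slot].last_used) slot = i;
            }
            if (f[slot].page != -1 && f[slot].modified)
                s.writebacks++;                /* dirty victim goes back to disk */
            f[slot].page = refs[t];
            f[slot].modified = false;
        }
        f[slot].last_used = (unsigned)t + 1;   /* 0 means "never used" */
        if (writes[t]) f[slot].modified = true;
    }
    return s;
}
```

For the reference string 1, 2, 3, 1, 4, 5 with three frames, where only the access to page 2 is a write, the simulation reports five page faults and one write-back, because page 2 is the least recently used page when page 4 arrives.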
Paging systems also employ other paging algorithms, another being anticipatory paging. This involves trying to anticipate which pages might be needed by a running program in the near future, and pre-loading them into RAM in order to reduce the overhead incurred by page faults.
iAPX386 Segmentation
When running in protected mode, the various
segment registers are internally cached in segment descriptors. These
descriptors are NOT programmer visible, but their contents are automatically
loaded by the processor from the contents of the segment registers and the
descriptor tables pointed to by GDTR and LDTR. All memory accesses use these
internal segment descriptors.
The internal segment descriptors are used to provide fast checking and address calculations. Without them, every memory reference would need to access the descriptor entries stored in the GDT/LDT tables pointed to by GDTR and LDTR. Note that these internal descriptor cache registers are reloaded whenever a segment register's contents are altered.
The following table illustrates address translation in protected mode. The segment register value points to an entry in an operating system table, which holds the base address and access rights of that segment. The base address derived from the table is added to the offset portion to form the linear address (which is also the physical address when paging is disabled).
There are THREE operating system tables used in the segmentation scheme.
The GDT and LDT can each hold up to 8192 entries; the IDT needs at most 256, one per interrupt vector. The address of each table is held in a processor register, GDTR, IDTR and LDTR respectively. The instructions LGDT and LIDT load a table's base address and limit into GDTR and IDTR, whereas LLDT loads LDTR with a selector for an LDT descriptor held in the GDT. A segment cannot be accessed by a task unless there is a corresponding entry in the GDT/LDT. Each entry in the GDT and LDT is eight bytes long, and is called a segment descriptor.
Because each entry is eight bytes long, valid segment register values (for GDT entries at privilege level 0) are 08h, 10h, 18h, 20h etc. Entry 00h, the null selector, is not used. A program loads the segment registers with the correct value to point to its associated segment descriptor.
Segment Registers in Protected Mode
How do we find out the current privilege level?
mov ax, cs      ; copy the code segment selector into AX
and ax, 03h     ; keep bits 0-1, which hold the current privilege level (CPL)
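A selector value can be decoded in C to show where the 08h, 10h, 18h spacing and the CPL mask come from. The struct and function names below are hypothetical; the field layout (13-bit index, table indicator bit, two-bit requested privilege level) is the standard 386 selector format:

```c
#include <stdint.h>

/* A protected-mode segment selector:
     bits 15-3  index of the 8-byte descriptor in its table
     bit  2     table indicator (0 = GDT, 1 = LDT)
     bits 1-0   requested privilege level (RPL); in CS this is the CPL */
typedef struct {
    unsigned index;
    unsigned table;  /* 0 = GDT, 1 = LDT */
    unsigned rpl;
} Selector;

Selector decode_selector(uint16_t sel)
{
    Selector s;
    s.index = sel >> 3;
    s.table = (sel >> 2) & 1u;
    s.rpl   = sel & 3u;  /* the same mask as "and ax, 03h" */
    return s;
}

/* Byte offset of the descriptor within its table: index * 8. */
uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel & 0xFFF8u);
}
```

Decoding 001Bh gives index 3 in the GDT with RPL 3, and the largest possible index, 8191, is why the GDT and LDT are limited to 8192 entries.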
Calculation of the segment's physical address
The global descriptor table contains descriptors which are normally available to all tasks in the system. Generally, these describe segments used by the operating system.
The local descriptor table contains descriptors associated with a given task. The operating system assigns each task a separate LDT. The table provides a mechanism for isolating a given task's code and data segments from the rest of the operating system or other tasks.
A segment cannot be accessed by a task if its segment descriptor does not exist in either an LDT or the GDT.
The basic format of a segment descriptor is,
If the descriptor is a code or data descriptor, it looks like,
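The eight-byte layout of a code or data descriptor can be decoded in C. This sketch assumes an expand-up segment and uses hypothetical names. It follows the standard 386 format (limit in bytes 0-1 and the low half of byte 6, base in bytes 2-4 and 7, access rights in byte 5, granularity bit in bit 7 of byte 6) and then forms the linear address as base plus offset after a limit check:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t base;     /* 32-bit segment base address */
    uint32_t limit;    /* effective limit in bytes (after granularity) */
    unsigned dpl;      /* descriptor privilege level 0-3 */
    bool     present;  /* P bit */
    unsigned type;     /* 4-bit type field (code/data, access rights) */
} SegmentInfo;

SegmentInfo decode_descriptor(const uint8_t d[8])
{
    SegmentInfo s;
    uint32_t raw_limit = d[0] | (d[1] << 8) | ((uint32_t)(d[6] & 0x0F) << 16);

    s.base    = d[2] | (d[3] << 8) | ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
    s.present = (d[5] & 0x80) != 0;
    s.dpl     = (d[5] >> 5) & 3u;
    s.type    = d[5] & 0x0F;
    /* G bit set: the limit counts 4K units, so scale it and fill the low 12 bits. */
    s.limit   = (d[6] & 0x80) ? (raw_limit << 12) | 0xFFFu : raw_limit;
    return s;
}

/* Linear address = base + offset, after checking the offset against the limit.
   Returns false where the processor would raise an exception instead. */
bool segment_linear(const SegmentInfo *s, uint32_t offset, uint32_t *linear)
{
    if (!s->present || offset > s->limit)
        return false;
    *linear = s->base + offset;
    return true;
}
```

The bytes FF FF 00 00 00 9A CF 00, for example, describe a present, privilege level 0 code segment with base 0 and a limit of FFFFFFFFh (the whole 4 Gbyte address space), because the granularity bit scales the 20-bit limit by 4K.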